$$ \newcommand{\mat}[1]{\boldsymbol {#1}} \newcommand{\mattr}[1]{\boldsymbol {#1}^\top} \newcommand{\matinv}[1]{\boldsymbol {#1}^{-1}} \newcommand{\vec}[1]{\boldsymbol {#1}} \newcommand{\vectr}[1]{\boldsymbol {#1}^\top} \newcommand{\rvar}[1]{\mathrm {#1}} \newcommand{\rvec}[1]{\boldsymbol{\mathrm{#1}}} \newcommand{\diag}{\mathop{\mathrm {diag}}} \newcommand{\set}[1]{\mathbb {#1}} \newcommand{\cset}[1]{\mathcal{#1}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{\pderiv}[2]{\frac{\partial #1}{\partial #2}} \newcommand{\bb}[1]{\boldsymbol{#1}} \newcommand{\E}[2][]{\mathbb{E}_{#1}\left[#2\right]} \newcommand{\ip}[3]{\left<#1,#2\right>_{#3}} \newcommand{\given}[]{\,\middle\vert\,} \newcommand{\DKL}[2]{\cset{D}_{\text{KL}}\left(#1\,\Vert\, #2\right)} \newcommand{\grad}[]{\nabla} $$
Part 1: Mini-Project¶
In this part you'll implement a small comparative-analysis project, heavily based on the materials from the tutorials and homework.
Guidelines¶
- You should implement the code which displays your results in this notebook, and add any additional code files for your implementation in the project/ directory. You can import these files here, as we do for the homeworks.
- Running this notebook should not perform any training - load your results from some output files and display them here. The notebook must be runnable from start to end without errors.
- You must include a detailed write-up (in the notebook) of what you implemented and how.
- Explain the structure of your code and how to run it to reproduce your results.
- Explicitly state any external code you used, including built-in pytorch models and code from the course tutorials/homework.
- Analyze your numerical results, explaining why you got these results (not just specifying the results).
- Where relevant, place all results in a table or display them using a graph.
- Before submitting, make sure all files which are required to run this notebook are included in the generated submission zip.
- Try to keep the submission file size under 10MB. Do not include model checkpoint files, dataset files, or any other non-essential files. Instead include your results as images/text files/pickles/etc, and load them for display in this notebook.
Object detection on TACO dataset¶
TACO is a growing image dataset of waste in the wild. It contains images of litter taken under diverse environments: woods, roads and beaches.

You can read more about the dataset here: https://github.com/pedropro/TACO
You can explore the data distribution and how to load it here: https://github.com/pedropro/TACO/blob/master/demo.ipynb
The stable version of the dataset, which contains 1500 images and 4787 annotations, is located in datasets/TACO-master.
You do not need to download the dataset.
Project goals:¶
- You need to perform an object detection task over 7 categories of the dataset.
- The annotations for object detection can be downloaded from here: https://github.com/wimlds-trojmiasto/detect-waste/tree/main/annotations.
- The data and annotation format follows the COCO API: https://github.com/cocodataset/cocoapi (you can find a notebook showing how to perform evaluation with it here: https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb; note that you need to install it).
- If you need a beginner guide for object detection with the COCO API, see this link: https://www.neuralception.com/cocodatasetapi/
What do I need to do?¶
- Everything is in the game! As long as your model does not require more than 8 GB of memory and you follow the guidelines above.
What does it mean?¶
- You can use data augmentation: either use what's implemented in the directory, or use external libraries such as https://albumentations.ai/ (notice that when you create your own augmentations you need to update the annotations as well).
- You can use more data if you find it useful (for example, review https://github.com/AgaMiko/waste-datasets-review).
What model can I use?¶
- Whatever you want! As a reference, you can review good models for the COCO object detection task (SOTA: https://paperswithcode.com/sota/object-detection-on-coco, real-time: https://paperswithcode.com/sota/real-time-object-detection-on-coco), or you can use older models like YOLOv3 or Faster R-CNN.
- As long as you have a reason (complexity, speed, performance), you are golden.
Tips for a good grade:¶
- Start as simple as possible. Dealing with APIs is not easy the first time, and I predict this will be your main issue. Only once you have a running model that learns should you add training tricks.
- Use visualization in the notebook, as we did throughout the course: check that your input actually fits the model, that the output is the desired size, and so on.
- It is recommended to resize the images to a fixed size, as shown here: https://github.com/pedropro/TACO/blob/master/detector/inspect_data.ipynb
- Please address the architecture and your loss function(s) in this notebook. If you decide to add a loss component, like the focal loss for instance, try to show the results before and after using it.
- Plot your losses in this notebook; any evaluation metric can be shown as a function of time and, where possible, analyzed per class.
Good luck!
Implementation¶
TODO: This is where you should write your explanations and implement the code to display the results. See guidelines about what to include in this section.
TACO (Trash Annotations in Context) is a growing image dataset of waste in the environment. Its primary aim is to aid the development of models for waste detection, classification, and segmentation.
Extended Overview of the TACO Dataset:
Origins and Motivation: As the global community has become more environmentally conscious, the need to identify and manage waste effectively has grown. The TACO dataset emerged from this urgent requirement. It was designed to help the machine learning community focus on creating innovative solutions that can automate the detection, classification, and management of waste in various contexts.
Rich Annotation Structure: Every image in the TACO dataset is accompanied by detailed metadata. This metadata doesn't just label an item as waste; it goes further to categorize the type of waste. For instance, instead of simply tagging an item as 'plastic,' the dataset may specify whether it's a 'plastic bottle,' 'plastic bag,' or some other subtype. Such detailed annotations help in designing algorithms that can recommend specific recycling or disposal methods.
Diverse Environments: The images in the TACO dataset aren't restricted to one particular setting. They span urban streets, beaches, forests, and more. Such variety ensures that models trained on this dataset can recognize waste in multiple environments. This is crucial for real-world applications, as waste is a universal issue, not restricted by geography.
Image Quality and Variability: The dataset contains images of varying quality, from high-resolution photos to potentially blurry or low-light shots. This mimics real-world scenarios where, for instance, a drone or a moving robot might capture imperfect shots while scanning an area for waste.
Extensions and Collaborations: Given its open-source nature, researchers and institutions have been encouraged to expand the TACO dataset. Collaborative efforts can lead to the inclusion of images from new regions, different waste types, or even annotated videos in the future.
Training, Validation, and Testing Sets: Like many structured datasets, TACO is segmented into training, validation, and testing sets. This ensures that models can be trained on one subset, fine-tuned on another, and finally evaluated on a completely unseen set to gauge their real-world performance.
Usability and Integration: To facilitate ease of use, the dataset often comes with tools or scripts that help in visualizing the annotations, splitting the data, or even converting the annotations into formats compatible with popular machine learning frameworks.
Impact and Future Potential: As environmental concerns grow, datasets like TACO become even more critical. They not only pave the way for cutting-edge research but also have practical implications. Models trained on TACO could be deployed in smart cities, by environmental agencies, or integrated into waste management systems to automatically detect and categorize waste, thereby facilitating more efficient recycling and disposal methods.
In summary, the TACO dataset serves as a cornerstone for researchers and developers working towards creating solutions for environmental waste detection and management. It provides a rich set of annotated images that cover a wide range of waste types and scenarios.
import json
import matplotlib.pyplot as plt
from collections import defaultdict
import pandas as pd
import seaborn as sns
sns.set()
annotations_path = "data/annotations.json"
# Load the TACO annotations
with open(annotations_path, "r") as f:
    taco_data = json.load(f)
print("Number of categories:", len(taco_data["categories"]))
print("Number of annotations:", len(taco_data["annotations"]))
print("Number of images:", len(taco_data["images"]))
Number of categories: 60 Number of annotations: 4784 Number of images: 1500
def get_image(img_id):
    for img in taco_data["images"]:
        if img["id"] == img_id:
            return img
    return None

bbox_areas = []
for annotation in taco_data["annotations"]:
    img = get_image(annotation["image_id"])
    bbox_area = annotation["bbox"][2] * annotation["bbox"][3]
    img_area = img["width"] * img["height"]
    bbox_areas.append(bbox_area / img_area)
plt.figure(figsize=(15, 7))
plt.yscale("log")
plt.hist(bbox_areas, bins=100)
plt.title("Bounding box area relative to image area", fontsize=20)
plt.ylabel("Number of annotations (log scale)", fontsize=15)
plt.xlabel("Bounding box area / Image area", fontsize=15)
plt.show()
# Extract image resolutions
resolutions = [(image['width'], image['height']) for image in taco_data['images']]
# Count occurrences of each resolution
resolution_counts = defaultdict(int)
for resolution in resolutions:
    resolution_counts[resolution] += 1
# Prepare data for plotting
unique_resolutions = list(resolution_counts.keys())
counts = list(resolution_counts.values())
labels = [f"{w}x{h}" for w, h in unique_resolutions]
# Plot
plt.figure(figsize=(15, 7))
plt.bar(labels, counts)
plt.xticks(rotation=45, ha="right")
plt.ylabel('Number of Images')
plt.title('Number of Images per Image Shape in TACO')
plt.tight_layout()
plt.show()
# Extract category information (COCO-style categories carry a name and a supercategory)
categories = {}
super_categories = {}
for item in taco_data['categories']:
    categories[item['id']] = item['name']
    super_categories[item['id']] = item['supercategory']
# Count instances for each category
class_counts = {}
for annotation in taco_data['annotations']:
    category_id = annotation['category_id']
    class_name = categories[category_id]
    class_counts[class_name] = class_counts.get(class_name, 0) + 1
# Plot
plt.figure(figsize=(15, 7))
plt.bar(class_counts.keys(), class_counts.values())
plt.xticks(rotation=45, ha='right')
plt.xlabel('Classes')
plt.ylabel('Number of Instances')
plt.title('Distribution of Classes in TACO Dataset')
plt.tight_layout()
plt.show()
widths = []
heights = []
shape_freqs = []
img_shapes_keys = {}
for img in taco_data['images']:
    key = str(img['width']) + '-' + str(img['height'])
    if key in img_shapes_keys:
        shape_id = img_shapes_keys[key]
        shape_freqs[shape_id] += 1
    else:
        img_shapes_keys[key] = len(widths)
        widths.append(img['width'])
        heights.append(img['height'])
        shape_freqs.append(1)

d = {'Image width (px)': widths, 'Image height (px)': heights, '# images': shape_freqs}
df = pd.DataFrame(d)
cmap = sns.cubehelix_palette(dark=.1, light=.6, as_cmap=True)
plt.figure(figsize=(12,7))
plot = sns.scatterplot(x="Image width (px)", y="Image height (px)", size='# images', hue="# images", palette=cmap, data=df, sizes=(20, 200))
plot.set_title('Number of images per image shape', fontsize=15)
# Extract category names
picked_categories_names = [
"Clear plastic bottle",
"Styrofoam piece",
"Plastic film",
"Pop tab",
"Cigarette",
"Aluminium foil",
"Unlabeled litter",
]
picked_categories_ids = [
cat_id for cat_id, name in categories.items() if name in picked_categories_names
]
picked_categories_ids
[0, 5, 36, 50, 57, 58, 59]
Dataset Split:¶
- Training Set (60-80%): A majority of the data is reserved for training. This is where the model learns the patterns and features.
- Validation Set (10-20%): Used to tune hyperparameters and to prevent overfitting. Feedback from this set helps us adjust the model during training.
- Test Set (10-20%): To evaluate the model's performance on entirely unseen data. It's essential we never use this data during the training or tuning process.

2. Ensure Representation:
When splitting the data, we should ensure all classes are well-represented in the training, validation, and test sets. Especially in object detection, some classes might be underrepresented, so a stratified sampling method should be considered.

3. Avoid Data Leakage:
We need to ensure that there's no overlap between the training, validation, and test sets. Data leakage can give an overly optimistic evaluation of a model's performance.

4. Temporal or Logical Separation (if applicable):
For datasets with temporal aspects (e.g., video frames), it's crucial to avoid mixing data from different time periods across sets. If we're training a model to detect objects in videos, we wouldn't want consecutive frames to be in both the training and test sets, as they're highly correlated.

5. Augmentations and Variations:
If we're using data augmentation (like random crops, flips, color adjustments), we need to ensure that these augmented variations appear only in the training set, not in the validation or test sets. The validation and test sets should be as close to "real-world" data as possible.
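The principles above can be illustrated with a minimal image-level random split. This is only a sketch of the idea (the project's actual logic lives in `project/dataset_split.py`; the function name and percentages here are illustrative): splitting by image, rather than by annotation, guarantees all annotations of one image land in the same subset, which avoids leakage.

```python
import random

def split_image_ids(image_ids, train_frac=0.8, val_frac=0.1, seed=42):
    """Randomly partition image ids into disjoint train/val/test sets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    return {
        "train": set(ids[:n_train]),
        "val": set(ids[n_train:n_train + n_val]),
        "test": set(ids[n_train + n_val:]),
    }

splits = split_image_ids(range(1500))
print({k: len(v) for k, v in splits.items()})  # {'train': 1200, 'val': 150, 'test': 150}
```

A stratified variant would perform this shuffle per category and then merge, so that rare classes are represented in every subset.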
from project.dataset_split import random_split_dataset
random_split_dataset(annotations_path, picked_categories_ids)
Random Dataset splitting configuration:
TEST SET: 10% Annotations Number:219
VALIDATION SET: 10% Annotations Number:220
TRAIN SET: 80% Annotations Number:1754
{'id': 3242, 'image_id': 1080, 'category_id': 5, 'segmentation': [[1199.0, 1452.0, 1218.0, 1441.0, 1228.0, 1416.0, 1226.0, 1390.0, 1205.0, 1373.0, 1180.0, 1360.0, 1165.0, 1351.0, 1148.0, 1346.0, 1123.0, 1344.0, 1101.0, 1349.0, 1095.0, 1357.0, 1093.0, 1368.0, 1084.0, 1384.0, 1075.0, 1398.0, 1074.0, 1408.0, 1093.0, 1431.0, 1100.0, 1448.0, 1121.0, 1463.0, 1149.0, 1468.0, 1186.0, 1461.0, 1199.0, 1452.0]], 'area': 13902.0, 'bbox': [1074.0, 1344.0, 154.0, 124.0], 'iscrowd': 0}
{'id': 1722, 'image_id': 516, 'category_id': 5, 'segmentation': [[3641, 3006, 3395, 2980, 3241, 2887, 3133, 2820, 3088, 2785, 3055, 2757, 2990, 2718, 2881, 2644, 2681, 2508, 2605, 2450, 2587, 2406, 2589, 2369, 2602, 2345, 2621, 2300, 2652, 2249, 2664, 2206, 2684, 2183, 2706, 2139, 2720, 2133, 2741, 2094, 2761, 2073, 2793, 2052, 2848, 2086, 2936, 2172, 2977, 2198, 3068, 2275, 3146, 2342, 3220, 2427, 3294, 2467, 3470, 2623, 3499, 2648, 3566, 2679, 3643, 2739, 3702, 2775, 3768, 2862, 3791, 2879, 3810, 2879, 3832, 2900, 3870, 2938, 3868, 2961, 3846, 3005, 3797, 3066, 3757, 3101, 3718, 3066, 3695, 3067, 3670, 3052, 3671, 3035]], 'area': 492380.0, 'bbox': [2587.0, 2052.0, 1283.0, 1049.0], 'iscrowd': 0}
{'id': 3651, 'image_id': 1181, 'category_id': 5, 'segmentation': [[3893.0, 1566.0, 3901.0, 1565.0, 3922.0, 1553.0, 3935.0, 1542.0, 3946.0, 1533.0, 3955.0, 1524.0, 3958.0, 1519.0, 3969.0, 1516.0, 3978.0, 1513.0, 3984.0, 1507.0, 3990.0, 1504.0, 3993.0, 1501.0, 4003.0, 1497.0, 4010.0, 1498.0, 4016.0, 1502.0, 4026.0, 1509.0, 4038.0, 1511.0, 4049.0, 1512.0, 4061.0, 1512.0, 4076.0, 1513.0, 4085.0, 1514.0, 4089.0, 1516.0, 4091.0, 1523.0, 4097.0, 1525.0, 4102.0, 1525.0, 4108.0, 1526.0, 4119.0, 1529.0, 4130.0, 1532.0, 4139.0, 1526.0, 4146.0, 1517.0, 4152.0, 1505.0, 4155.0, 1494.0, 4155.0, 1483.0, 4154.0, 1478.0, 4151.0, 1474.0, 4127.0, 1464.0, 4123.0, 1459.0, 4117.0, 1457.0, 4112.0, 1462.0, 4109.0, 1460.0, 4103.0, 1450.0, 4094.0, 1440.0, 4085.0, 1434.0, 4077.0, 1432.0, 4060.0, 1432.0, 4037.0, 1434.0, 4022.0, 1435.0, 4008.0, 1435.0, 3992.0, 1439.0, 3984.0, 1443.0, 3970.0, 1451.0, 3959.0, 1456.0, 3950.0, 1459.0, 3941.0, 1459.0, 3916.0, 1457.0, 3898.0, 1458.0, 3882.0, 1460.0, 3867.0, 1463.0, 3856.0, 1467.0, 3846.0, 1474.0, 3841.0, 1481.0, 3840.0, 1485.0, 3840.0, 1490.0, 3840.0, 1494.0, 3835.0, 1499.0, 3831.0, 1503.0, 3830.0, 1507.0, 3831.0, 1515.0, 3833.0, 1523.0, 3838.0, 1526.0, 3842.0, 1528.0, 3845.0, 1532.0, 3846.0, 1537.0, 3842.0, 1540.0, 3842.0, 1546.0, 3844.0, 1554.0, 3848.0, 1561.0, 3853.0, 1563.0, 3858.0, 1566.0, 3867.0, 1566.0, 3881.0, 1566.0, 3893.0, 1566.0]], 'area': 24792.0, 'bbox': [3830.0, 1432.0, 325.0, 134.0], 'iscrowd': 0}
{'id': 1778, 'image_id': 549, 'category_id': 36, 'segmentation': [[512, 246, 620, 334, 495, 473, 380, 471, 320, 419, 389, 334, 475, 259]], 'area': 39118.30059776045, 'bbox': [319.69687, 246.21666, 299.96249, 226.46666], 'iscrowd': 0}
{'id': 1717, 'image_id': 513, 'category_id': 36, 'segmentation': [[603, 117, 959, 76, 1201, 32, 1249, 128, 1224, 250, 1274, 506, 1307, 754, 1351, 1029, 1404, 1419, 1332, 1467, 1381, 1517, 1526, 1455, 1652, 1430, 1786, 1418, 2001, 1434, 2312, 1486, 2596, 1538, 2729, 1576, 2798, 1626, 2858, 1694, 2914, 1738, 3280, 1979, 3352, 2045, 3398, 2087, 3554, 2190, 3699, 2282, 3821, 2379, 3890, 2431, 3964, 2473, 4007, 2495, 4100, 2569, 4149, 2608, 4149, 2836, 4074, 2767, 3955, 2682, 3880, 2617, 3789, 2551, 3656, 2493, 3516, 2449, 3359, 2416, 3241, 2399, 3114, 2337, 2979, 2282, 2834, 2214, 2688, 2154, 2554, 2099, 2407, 2027, 2259, 1962, 2114, 1920, 1980, 1887, 1831, 1831, 1732, 1805, 1650, 1801, 1551, 1812, 1468, 1810, 1386, 1809, 1259, 1795, 1191, 1777, 1148, 1750, 1059, 1731, 990, 1724, 881, 1704, 766, 1671, 739, 1610, 667, 1476, 649, 1341, 640, 1205, 624, 963, 619, 807, 608, 570, 598, 422, 592, 280, 592, 156]], 'area': 2252708.7761265594, 'bbox': [591.6667, 32.047607, 3557.4283000000005, 2803.761993], 'iscrowd': 0}
{'id': 2395, 'image_id': 713, 'category_id': 59, 'segmentation': [[274.0, 2408.0, 276.0, 2398.0, 320.0, 2344.0, 332.0, 2344.0, 330.0, 2354.0, 284.0, 2406.0, 274.0, 2408.0]], 'area': 990.0, 'bbox': [274.0, 2344.0, 58.0, 64.0], 'iscrowd': 0}
{'id': 2529, 'image_id': 803, 'category_id': 5, 'segmentation': [[2448.0, 407.0, 2428.0, 416.0, 2422.0, 461.0, 2418.0, 642.0, 2372.0, 695.0, 2357.0, 716.0, 2419.0, 972.0, 2448.0, 918.0, 2448.0, 407.0]], 'area': 25332.5, 'bbox': [2357.0, 407.0, 91.0, 565.0], 'iscrowd': 0}
{'id': 1911, 'image_id': 610, 'category_id': 58, 'segmentation': [[1799.0, 2094.0, 1765.0, 2067.0, 1765.0, 2110.0, 1791.0, 2128.0]], 'area': 1245.0, 'bbox': [1765.0, 2067.0, 34.0, 61.0], 'iscrowd': 0}
{'id': 4237, 'image_id': 1331, 'category_id': 59, 'segmentation': [[2150, 382, 2197, 402, 2199, 414, 2193, 426, 2147, 409]], 'area': 1298.5, 'bbox': [2147.0, 382.0, 52.0, 44.0], 'iscrowd': 0}
{'id': 642, 'image_id': 191, 'category_id': 58, 'segmentation': [[1483, 2014, 1526, 2042, 1581, 2050, 1577, 2028, 1564, 2022, 1564, 2010, 1500, 1994]], 'area': 3056.9999999999945, 'bbox': [1483.0, 1994.0000000000002, 98.0, 55.99999999999977], 'iscrowd': 0}
{'id': 371, 'image_id': 109, 'category_id': 36, 'segmentation': [[1004, 253, 1010, 161, 1003, 104, 1038, 86, 1103, 95, 1219, 108, 1323, 14, 1333, -1, 1431, 4, 1426, 33, 1486, 72, 1572, 26, 1615, 33, 1709, 33, 1774, 80, 1771, 106, 1790, 111, 1783, 178, 1761, 254, 1743, 287, 1674, 267, 1597, 267, 1572, 373, 1524, 395, 1416, 385, 1314, 401, 1217, 410, 1151, 381, 1094, 306, 1011, 286]], 'area': 225464.0, 'bbox': [1003.0, -1.0, 787.0, 411.0], 'iscrowd': 0}
{'id': 2875, 'image_id': 953, 'category_id': 58, 'segmentation': [[193.0, 1156.0, 202.0, 1138.0, 208.0, 1120.0, 217.0, 1108.0, 233.0, 1103.0, 250.0, 1113.0, 254.0, 1120.0, 253.0, 1128.0, 239.0, 1132.0, 237.0, 1140.0, 236.0, 1150.0, 235.0, 1159.0, 214.0, 1170.0, 203.0, 1169.0, 193.0, 1156.0]], 'area': 2346.5, 'bbox': [193.0, 1103.0, 61.0, 67.0], 'iscrowd': 0}
{'id': 353, 'image_id': 105, 'category_id': 58, 'segmentation': [[1658, 2114, 1655, 2077, 1658, 2053, 1693, 2052, 1734, 2086, 1747, 2118, 1734, 2133, 1690, 2127]], 'area': 5306.5, 'bbox': [1655.0, 2052.0, 92.0, 81.0], 'iscrowd': 0}
{'id': 1488, 'image_id': 420, 'category_id': 36, 'segmentation': [[675, 3420, 1005, 3298, 1220, 3205, 1173, 3094, 1159, 3007, 680, 3205, 321, 3357, 336, 3407, 412, 3508, 460, 3496]], 'area': 176222.0, 'bbox': [321.0, 3007.0, 899.0, 501.0], 'iscrowd': 0}
{'id': 4031, 'image_id': 1244, 'category_id': 59, 'segmentation': [[350.0, 1344.0, 368.0, 1314.0, 358.0, 1310.0, 340.0, 1340.0, 350.0, 1344.0]], 'area': 372.0, 'bbox': [340.0, 1310.0, 28.0, 34.0], 'iscrowd': 0}
{'id': 2132, 'image_id': 663, 'category_id': 36, 'segmentation': [[1165, 1376, 1208, 1309, 1238, 1264, 1285, 1198, 1324, 1150, 1364, 1109, 1474, 1029, 1478, 1015, 1494, 1010, 1497, 1041, 1532, 1042, 1541, 1034, 1529, 1019, 1531, 993, 1541, 978, 1560, 988, 1575, 976, 1586, 967, 1600, 995, 1606, 1011, 1620, 1000, 1644, 1002, 1640, 1012, 1678, 1011, 1689, 1018, 1701, 1027, 1710, 1055, 1696, 1059, 1691, 1071, 1624, 1066, 1635, 1094, 1655, 1120, 1660, 1141, 1674, 1163, 1666, 1209, 1652, 1210, 1654, 1276, 1648, 1338, 1633, 1401, 1587, 1534, 1196, 1533, 1177, 1513, 1153, 1513, 1116, 1504, 1118, 1472, 1135, 1457, 1147, 1441, 1158, 1403]], 'area': 205172.0, 'bbox': [1116.0, 967.0, 594.0, 567.0], 'iscrowd': 0}
{'id': 2618, 'image_id': 844, 'category_id': 58, 'segmentation': [[1844.0, 2384.0, 1838.0, 2404.0, 1854.0, 2413.0, 1845.0, 2442.0, 1862.0, 2444.0, 1873.0, 2441.0, 1916.0, 2470.0, 1931.0, 2492.0, 1957.0, 2504.0, 1952.0, 2510.0, 1945.0, 2526.0, 1964.0, 2547.0, 2007.0, 2569.0, 2022.0, 2572.0, 2009.0, 2546.0, 1993.0, 2526.0, 2009.0, 2527.0, 2056.0, 2520.0, 2093.0, 2520.0, 2128.0, 2524.0, 2121.0, 2498.0, 2077.0, 2458.0, 2041.0, 2436.0, 2025.0, 2422.0, 1968.0, 2460.0, 1974.0, 2464.0, 1966.0, 2471.0, 1959.0, 2463.0, 1967.0, 2401.0, 1976.0, 2383.0, 1963.0, 2321.0, 1949.0, 2314.0, 1932.0, 2320.0, 1912.0, 2317.0, 1891.0, 2333.0, 1885.0, 2350.0, 1861.0, 2366.0, 1844.0, 2384.0]], 'area': 29176.0, 'bbox': [1838.0, 2314.0, 290.0, 258.0], 'iscrowd': 0}
{'id': 3756, 'image_id': 1213, 'category_id': 58, 'segmentation': [[279.0, 1841.0, 311.0, 1876.0, 334.0, 1839.0, 340.0, 1827.0, 322.0, 1826.0, 308.0, 1825.0, 291.0, 1826.0, 278.0, 1828.0, 279.0, 1841.0]], 'area': 1808.5, 'bbox': [278.0, 1825.0, 62.0, 51.0], 'iscrowd': 0}
{'id': 4539, 'image_id': 1425, 'category_id': 36, 'segmentation': [[1074, 3176, 1086, 3105, 1156, 3064, 1216, 3039, 1236, 3060, 1249, 3089, 1263, 3096, 1262, 3123, 1248, 3191, 1201, 3215, 1198, 3186, 1212, 3177, 1216, 3126, 1233, 3101, 1222, 3067, 1171, 3102, 1105, 3131, 1121, 3231, 1107, 3231, 1101, 3185]], 'area': 12810.999999999987, 'bbox': [1074.0, 3038.9999999999995, 189.0, 192.0], 'iscrowd': 0}
{'id': 4107, 'image_id': 1316, 'category_id': 5, 'segmentation': [[1384, 741, 1398, 705, 1409, 692, 1435, 693, 1470, 714, 1523, 718, 1558, 719, 1585, 740, 1597, 750, 1613, 752, 1606, 788, 1580, 786, 1561, 795, 1541, 799, 1516, 791, 1462, 770, 1432, 774, 1408, 768, 1393, 758]], 'area': 14507.0, 'bbox': [1383.8096, 692.1428, 229.0, 107.0], 'iscrowd': 0}
{'id': 2748, 'image_id': 816, 'category_id': 59, 'segmentation': [[836.0, 1580.0, 826.0, 1516.0, 802.0, 1520.0, 816.0, 1588.0, 836.0, 1580.0]], 'area': 1524.0, 'bbox': [802.0, 1516.0, 34.0, 72.0], 'iscrowd': 0}
...
Now let's create the image-label folder structure:
from project.coco_to_yolo import coco_to_yolo
# Uncomment this code to prepare the dataset for training
# coco_to_yolo("data/annotations_train.json", "yolo_dataset", "train")
# coco_to_yolo("data/annotations_test.json", "yolo_dataset", "test")
# coco_to_yolo("data/annotations_val.json", "yolo_dataset", "valid")
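The core of such a conversion is the standard COCO-to-YOLO box transform: COCO stores boxes as `[x_min, y_min, width, height]` in pixels, while YOLO expects `[x_center, y_center, width, height]` normalized to `[0, 1]`. The function below is an illustrative standalone sketch of that formula, not the exact code in `project/coco_to_yolo.py`:

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO bbox [x_min, y_min, w, h] (pixels) to
    YOLO format [x_center, y_center, w, h], normalized to [0, 1]."""
    x, y, w, h = bbox
    return [
        (x + w / 2) / img_w,  # normalized box-center x
        (y + h / 2) / img_h,  # normalized box-center y
        w / img_w,            # normalized width
        h / img_h,            # normalized height
    ]

# Example: the bbox from annotation id 3242 above, for a hypothetical 4000x3000 image
print(coco_bbox_to_yolo([1074.0, 1344.0, 154.0, 124.0], 4000, 3000))
```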
Now let's discuss some limitations and potential issues of the TACO dataset:
Size: The dataset isn't as large as some other standard datasets. For deep learning models, especially convolutional neural networks (CNNs), larger datasets can be more beneficial.
Imbalance: Like many real-world datasets, TACO might suffer from class imbalance, where some trash items are far more frequent than others. This can lead to models that are biased towards detecting frequent items and perform poorly on rare items.
Varying Quality: As the dataset is collected from diverse sources, there might be inconsistencies in image quality, resolution, and lighting conditions.
Complexity: Real-world scenarios introduce complexities like occlusions, varying perspectives, and background clutter. This might make some images in TACO particularly challenging.
Generalization: While TACO aims at a broad representation, it might not cover all possible types or contexts of trash. Models trained solely on TACO might not generalize well to some unrepresented scenarios.
Annotation Errors: As with any manually annotated dataset, there might be occasional errors or inconsistencies in bounding boxes or labels.
Conclusion:¶
The TACO dataset offers a valuable resource for training waste detection models in real-world scenarios. While it has its limitations, leveraging it in combination with other datasets or using data augmentation techniques can help in developing more robust and accurate waste detection systems. If using TACO for research or application development, we are aware of its potential pitfalls and will design our experiments and evaluations accordingly.
Data Enhancement Techniques - Leveraging data augmentation is a well-established way to train a model that generalizes effectively across varied conditions. By augmenting our dataset, we not only expanded its size but also made the model more robust to the conditions particular to our data. Common augmentations we used include image flipping, cropping, rotation, resizing, random erasing, and brightness/contrast adjustments, as informed by the study "Augmentation for Small Object Detection" (source).
Certain augmentations were particularly crucial. For instance, random resizing was instrumental in enhancing detection of smaller objects in our dataset, while random cropping addressed class imbalances effectively.
Additionally, we employed a less conventional "copy-paste" augmentation approach. Given an image's object segmentation mask, this method involves isolating select objects, especially those underrepresented in the dataset, and embedding them in random locations within another image. Such an approach aids in rectifying class imbalances and further bolstering model resilience across varied conditions. The method was inspired by the study titled "Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation" (source).
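The copy-paste idea can be sketched with NumPy as follows. This is a minimal illustration assuming a boolean object mask and a paste location that fits inside the target image; the project's actual implementation lives in `project/augmentation.py` and must also add a matching bounding-box annotation for the pasted object:

```python
import numpy as np

def copy_paste(src_img, src_mask, dst_img, top, left):
    """Paste the masked object from src_img into a copy of dst_img at (top, left).

    src_img:  (h, w, 3) uint8 source image containing the object
    src_mask: (h, w) boolean mask selecting the object's pixels
    dst_img:  (H, W, 3) uint8 target image (left unmodified)
    """
    out = dst_img.copy()
    h, w = src_mask.shape
    view = out[top:top + h, left:left + w]  # view into the paste region of the copy
    view[src_mask] = src_img[src_mask]      # overwrite only the object's pixels
    return out
```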
Illustrations of our Data Enhancement Techniques:
from project.augmentation import random_augment_images, augment_sample
random_samples = random_augment_images("yolo_dataset/test/images",10)
for sample in random_samples:
augment_sample(sample)
Evaluation Metrics:¶
Precision:
- Definition: Precision measures how many of the items identified as positive are actually positive. It is the ratio of correctly predicted positive instances to the total predicted positives.
- Formula: $Precision = \frac{True \ Positives}{True \ Positives + False \ Positives}$
- Use-case: Precision is particularly important in situations where false positives have a high cost. For example, in spam email detection, it's preferable not to mislabel a legitimate email as spam.
Recall (or Sensitivity or True Positive Rate):
- Definition: Recall measures how many of the actual positive cases were identified correctly. It is the ratio of correctly predicted positive instances to all actual positive instances.
- Formula: $Recall = \frac{True \ Positives}{True \ Positives + False \ Negatives}$
- Use-case: Recall is crucial in situations where missing a positive instance has a high cost. For instance, in medical diagnoses, it's important to identify as many actual positive cases as possible to avoid missing a disease.
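The two formulas above in code (a toy illustration on raw TP/FP/FN counts, not tied to any detection library):

```python
def precision(tp, fp):
    """Fraction of predicted positives that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy example: 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # 0.666...
```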
Mean Average Precision (mAP):
- Definition: mAP is a popular metric in object detection. For each class, an average precision (AP) is computed as the area under the precision-recall curve. mAP is then the average of AP over all classes. It takes into account both false positives (affecting precision) and false negatives (affecting recall).
- Use-case: Object detection tasks, especially when there are multiple classes. It provides a single score that balances precision and recall across all classes, making it a comprehensive metric for multi-class object detection.
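A minimal sketch of AP for a single class, accumulating the area under the precision-recall curve built from score-ranked detections. This uses a simple rectangle-rule approximation; the official COCO metric computed by pycocotools instead uses 101-point interpolation averaged over multiple IoU thresholds:

```python
def average_precision(is_tp, num_gt):
    """AP for one class.

    is_tp:  booleans for detections sorted by descending confidence
            (True = matched a ground-truth box, False = false positive)
    num_gt: total number of ground-truth boxes for this class
    """
    tp = fp = 0
    ap, prev_recall = 0.0, 0.0
    for flag in is_tp:
        tp += flag
        fp += not flag
        precision = tp / (tp + fp)
        recall = tp / num_gt
        ap += precision * (recall - prev_recall)  # rectangle under the PR curve
        prev_recall = recall
    return ap

print(average_precision([True, True, False, True], 4))  # 0.6875
```

mAP is then simply the mean of this quantity over all classes.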
IoU (Intersection over Union):
- Definition: IoU measures the overlap between two boundaries. Specifically, in object detection and segmentation, it's the overlap between the predicted bounding box (or segmentation area) and the ground truth, divided by the union of the two.
- Formula: $IoU = \frac{Area \ of \ Overlap}{Area \ of \ Union}$
- Use-case: Widely used in object detection and segmentation tasks. It provides a clear measure of how well the predicted bounding box or segmentation area matches the ground truth. Often, a threshold (e.g., 0.5) is set for IoU to decide if a prediction is a true positive or a false positive.
Understanding these metrics and choosing the right one is critical. While Precision and Recall are general metrics used across many tasks, mAP and IoU are more specific to object detection and segmentation. The appropriate metric often depends on the specific requirements and consequences of false positives and false negatives in the given application.
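As a concrete example, IoU for axis-aligned boxes in `[x_min, y_min, x_max, y_max]` form can be computed as follows (a self-contained sketch; detection frameworks ship their own vectorized versions):

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x_min, y_min, x_max, y_max] boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to 0 so disjoint boxes yield zero intersection
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou([0, 0, 2, 2], [1, 1, 3, 3]))  # 1/7 ≈ 0.1429
```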
An Introduction to YOLOv8¶
Ultralytics has introduced YOLOv8 as an enhanced real-time object detection system. Being the 8th iteration in the YOLO series, it boasts improvements over its predecessors, especially in aspects like speed, precision, and efficiency.
Remarkably, YOLOv8 has achieved a Mean Average Precision (mAP) of 53.9, setting a new benchmark in the YOLO series.
Developed in PyTorch, YOLOv8 is versatile, capable of running on both CPUs and GPUs. Its efficiency extends to supporting diverse formats, including TF.js and coreML. Drawing parallels with YOLOv7, it's adept at tasks ranging from object detection and image classification to segmentation.
Key Innovations in YOLOv8:
The model incorporates a refreshed backbone network, an anchor-free detection head, and an innovative loss function. These features make YOLOv8 an ideal pick for an array of object detection and segmentation assignments.
YOLOv8's Diverse Suite:
Within YOLOv8, five distinct models are offered for detection, classification, and segmentation. At one end of the spectrum, YOLOv8 Nano stands out for its swift performance and compact size. On the other hand, YOLOv8x is recognized for its unparalleled accuracy, even if it trades off some speed compared to its counterparts.
Key Features of YOLOv8¶
YOLOv8 introduces both advancements in architecture and enhancements for developers.
Compared with its predecessor, YOLOv5, YOLOv8 introduces:
- An anchor-free detection mechanism.
- Modifications to the convolutional blocks within the architecture.
- Mosaic augmentation during training, which is disabled for the final 10 epochs.
Moreover, YOLOv8 includes changes that improve the developer experience. For starters, it is now available as a library, allowing for easy integration into Python projects: simply running "pip install ultralytics" provides access to the models.
yolov8s¶
from project.yolo_train import show_model
show_model("yolov8s", 100)
Layer (type)                          Output Shape        Param #
--------------------------------------------------------------------------------
model.0.conv.weight                   [32, 3, 3, 3]           864
model.0.bn.weight                     [32]                     32
model.0.bn.bias                       [32]                     32
model.1.conv.weight                   [64, 32, 3, 3]        18432
...                                   (intermediate layers elided)
model.22.cv3.2.2.weight               [80, 128, 1, 1]       10240
model.22.cv3.2.2.bias                 [80]                     80
model.22.dfl.conv.weight              [1, 16, 1, 1]            16
--------------------------------------------------------------------------------
Total params: 11166560
Epochs: 100
Batch size: 16
Optimizer: SGD
Cosine learning rate: False
Dropout: False
IoU: 0.7
Augment: False
Learning rate 0: 0.01
Learning rate f: 0.01
yolov8n¶
show_model("yolov8n", 10)
Layer (type)                          Output Shape        Param #
--------------------------------------------------------------------------------
model.0.conv.weight                   [16, 3, 3, 3]           432
model.0.bn.weight                     [16]                     16
model.0.bn.bias                       [16]                     16
model.1.conv.weight                   [32, 16, 3, 3]         4608
...                                   (intermediate layers elided)
model.22.cv3.2.2.weight               [80, 80, 1, 1]         6400
model.22.cv3.2.2.bias                 [80]                     80
model.22.dfl.conv.weight              [1, 16, 1, 1]            16
--------------------------------------------------------------------------------
Total params: 3157200
Epochs: 20
Batch size: 16
Optimizer: SGD
Cosine learning rate: False
Dropout: False
IoU: 0.7
Augment: False
Learning rate 0: 0.01
Learning rate f: 0.01
show_model("yolov8n", 100)
Layer (type)                          Output Shape        Param #
--------------------------------------------------------------------------------
model.0.conv.weight                   [16, 3, 3, 3]           432
model.0.bn.weight                     [16]                     16
model.0.bn.bias                       [16]                     16
model.1.conv.weight                   [32, 16, 3, 3]         4608
...                                   (intermediate layers elided)
model.22.cv3.2.2.weight               [80, 80, 1, 1]         6400
model.22.cv3.2.2.bias                 [80]                     80
model.22.dfl.conv.weight              [1, 16, 1, 1]            16
--------------------------------------------------------------------------------
Total params: 3157200
Epochs: 100
Batch size: 16
Optimizer: Adam
Cosine learning rate: False
Dropout: False
IoU: 0.7
Augment: False
Learning rate 0: 0.01
Learning rate f: 0.01
yolov8m¶
show_model("yolov8m", 10)
Layer (type)                          Output Shape        Param #
--------------------------------------------------------------------------------
model.0.conv.weight                   [48, 3, 3, 3]          1296
model.0.bn.weight                     [48]                     48
model.0.bn.bias                       [48]                     48
model.1.conv.weight                   [96, 48, 3, 3]        41472
...                                   (intermediate layers elided)
model.22.cv3.2.2.weight               [80, 192, 1, 1]       15360
model.22.cv3.2.2.bias                 [80]                     80
model.22.dfl.conv.weight              [1, 16, 1, 1]            16
--------------------------------------------------------------------------------
Total params: 25902640
Epochs: 10
Batch size: 16
Optimizer: SGD
Cosine learning rate: False
Dropout: False
IoU: 0.7
Augment: False
Learning rate 0: 0.01
Learning rate f: 0.01
show_model("yolov8m", 100)
Layer (type)                          Output Shape        Param #
--------------------------------------------------------------------------------
model.0.conv.weight                   [48, 3, 3, 3]          1296
model.0.bn.weight                     [48]                     48
model.0.bn.bias                       [48]                     48
model.1.conv.weight                   [96, 48, 3, 3]        41472
...                                   (intermediate layers elided)
model.22.cv3.2.2.weight               [80, 192, 1, 1]       15360
model.22.cv3.2.2.bias                 [80]                     80
model.22.dfl.conv.weight              [1, 16, 1, 1]            16
--------------------------------------------------------------------------------
Total params: 25902640
Epochs: 100
Batch size: 16
Optimizer: Adam
Cosine learning rate: False
Dropout: False
IoU: 0.7
Augment: False
Learning rate 0: 0.01
Learning rate f: 0.01
Model Training Examples¶
from project.yolo_train import train_model
# Uncomment the line below to train the model (trained locally on an NVIDIA GeForce GTX 1080 Ti)
# train_model("yolov8s", 100)
Ultralytics YOLOv8.0.167  Python-3.10.12 torch-2.0.1 CUDA:0 (NVIDIA GeForce GTX 1080 Ti, 11264MiB)
engine\trainer: task=detect, mode=train, model=yolov8s.pt, data=project/TACO.yaml, epochs=100, patience=25, batch=16, imgsz=640, save=True, save_period=-1, cache=False, device=None, workers=8, project=None, name=yolov8s_100epochs, exist_ok=False, pretrained=True, optimizer=Adam, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=False, amp=True, fraction=1.0, profile=False, freeze=None, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, stream_buffer=False, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\yolov8s_100epochs11
Overriding model.yaml nc=80 with nc=60

                   from  n    params  module                                       arguments
  0                  -1  1       928  ultralytics.nn.modules.conv.Conv             [3, 32, 3, 2]
  1                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  2                  -1  1     29056  ultralytics.nn.modules.block.C2f             [64, 64, 1, True]
  3                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  4                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
  5                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  6                  -1  2    788480  ultralytics.nn.modules.block.C2f             [256, 256, 2, True]
  7                  -1  1   1180672  ultralytics.nn.modules.conv.Conv             [256, 512, 3, 2]
  8                  -1  1   1838080  ultralytics.nn.modules.block.C2f             [512, 512, 1, True]
  9                  -1  1    656896  ultralytics.nn.modules.block.SPPF            [512, 512, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 12                  -1  1    591360  ultralytics.nn.modules.block.C2f             [768, 256, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 15                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 16                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 18                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 19                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  1   1969152  ultralytics.nn.modules.block.C2f             [768, 512, 1]
 22        [15, 18, 21]  1   2139268  ultralytics.nn.modules.head.Detect           [60, [128, 256, 512]]

Model summary: 225 layers, 11158820 parameters, 11158804 gradients
Transferred 349/355 items from pretrained weights
Freezing layer 'model.22.dfl.conv.weight'
AMP: running Automatic Mixed Precision (AMP) checks with YOLOv8n...
AMP: checks failed. Anomalies were detected with AMP on your system that may lead to NaN losses or zero-mAP results, so AMP will be disabled during training.
train: Scanning C:\Users\Dan\Desktop\mini_project\yolo_dataset\train\labels.cache... 180 images, 0 backgrounds, 0 corrupt: 100%|██████████| 180/180
train: WARNING corrupt JPEG restored and saved: img_33.jpg, img_34.jpg, img_35.jpg, img_36.jpg, img_37.jpg
val: Scanning C:\Users\Dan\Desktop\mini_project\yolo_dataset\valid\labels.cache... 125 images, 0 backgrounds, 0 corrupt: 100%|██████████| 125/125
val: WARNING corrupt JPEG restored and saved: img_24.jpg, img_27.jpg, img_28.jpg, img_29.jpg, img_31.jpg, img_32.jpg, img_33.jpg, img_34.jpg, img_35.jpg, img_40.jpg
Plotting labels to runs\detect\yolov8s_100epochs11\labels.jpg...
optimizer: Adam(lr=0.01, momentum=0.937) with parameter groups 57 weight(decay=0.0), 64 weight(decay=0.0005), 63 bias(decay=0.0)
Image sizes 640 train, 640 val
Using 8 dataloader workers
Logging results to runs\detect\yolov8s_100epochs11
Starting training for 100 epochs...

      Epoch    GPU_mem   box_loss   cls_loss   dfl_loss  Instances       Size
      1/100      7.28G      1.619      9.711      1.432         27        640:  58%|█████▊    | 7/12 [00:03<00:02, 1.75it/s]
(Traceback truncated: this demo run was stopped manually with a KeyboardInterrupt during the first epoch. The actual training runs were executed offline via project/yolo_train.py, and their saved results are loaded and displayed in this notebook.)
Model Predictions Examples¶
import matplotlib.pyplot as plt
from PIL import Image
def plot_image(image_path):
# Open an image
img = Image.open(image_path)
# Display the image
plt.imshow(img)
plt.axis('off') # Hide axes
plt.show()
yolov8s 100 epochs:¶
plot_image("runs/detect/predict/img_1.jpg")
yolov8n 100 epochs:¶
plot_image("runs/detect/predict/img_2.jpg")
plot_image("runs/detect/predict/img_3.jpg")
yolov8m 10 epochs:¶
plot_image("runs/detect/predict/img_4.jpg")
plot_image("runs/detect/predict/img_5.jpg")
Model training results:¶
from project.yolo_train import plot_results
plot_results('runs/detect/train/yolov8m_10epochs/results.csv')
plot_results('runs/detect/train/yolov8m_100epochs/results.csv')
plot_results('runs/detect/train/yolov8n_10epochs/results.csv')
plot_results('runs/detect/train/yolov8n_100epochs/results.csv')
plot_results('runs/detect/train/yolov8s_100epochs/results.csv')
plot_image('runs/detect/train/yolov8m_10epochs/results.png')
plot_image('runs/detect/train/yolov8m_100epochs/results.png')
plot_image('runs/detect/train/yolov8n_10epochs/results.png')
plot_image('runs/detect/train/yolov8n_100epochs/results.png')
plot_image('runs/detect/train/yolov8s_100epochs/results.png')
plot_image('imgs/without_aug_and_cp.jpg')
plot_image('imgs/with_aug_and_cp.jpg')
Analysis and explanations¶
Loss functions:¶
Our YOLOv8 uses VFL Loss as the classification loss and DFL Loss + CIoU Loss as the regression loss. For sample matching, YOLOv8 abandons the earlier IoU-based or one-sided ratio assignment strategies in favor of the Task-Aligned Assigner.
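The Task-Aligned Assigner scores each anchor/ground-truth pair with an alignment metric that multiplies classification confidence and localization quality, then takes the top-k anchors per ground truth as positives. A minimal sketch of that metric (the alpha and beta values are commonly cited defaults, not necessarily the exact configuration in our runs):

```python
import torch

def alignment_metric(cls_score, iou, alpha=0.5, beta=6.0):
    """Task-alignment metric t = s^alpha * IoU^beta.

    A high value requires an anchor to be good at *both* classification
    (s, the predicted score for the GT class) and localization (IoU with
    the GT box). The top-k anchors by this metric become positives.
    """
    return cls_score.pow(alpha) * iou.pow(beta)
```

Because beta is much larger than alpha, the metric punishes poor localization far more harshly than low classification confidence.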
1. VFL Loss:¶
VFL (Varifocal Loss) is introduced to address the class imbalance issue during object detection. In object detection tasks, the majority of the anchors are negative (i.e., not containing an object). This leads to a class imbalance where the positive samples (those containing objects) are heavily outnumbered. VFL dynamically adjusts the weights of the positive and negative samples based on the model's prediction confidence. In simple terms, VFL gives more weight to those samples that the model is uncertain about, forcing the model to pay more attention to them.
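The varifocal weighting described above can be sketched in a few lines. This is a simplified stand-in for the Ultralytics implementation, with alpha and gamma set to the values proposed in the VFL paper:

```python
import torch
import torch.nn.functional as F

def varifocal_loss(pred_logits, target_score, alpha=0.75, gamma=2.0):
    """Sketch of Varifocal Loss (VFL).

    Positives (target_score > 0) are weighted by the target score itself,
    while negatives are down-weighted by alpha * p^gamma, so easy
    negatives contribute little to the total loss.
    """
    pred_prob = pred_logits.sigmoid()
    # weight: q for positives, alpha * p^gamma for negatives
    weight = torch.where(
        target_score > 0,
        target_score,
        alpha * pred_prob.pow(gamma),
    )
    bce = F.binary_cross_entropy_with_logits(
        pred_logits, target_score, reduction="none"
    )
    return (bce * weight).sum()
```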
2. DFL Loss:¶
Although DFL is sometimes listed alongside the classification losses, DFL (Distribution Focal Loss) and CIoU Loss are in fact regression losses. DFL addresses the inherent ambiguity of bounding-box boundaries by predicting a discrete probability distribution over candidate box offsets rather than a single value, which improves the accuracy of bounding-box predictions.
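The core of DFL is cross-entropy on the two integer bins that bracket each continuous target distance, weighted by how close the target is to each bin. A simplified sketch of that idea (not the Ultralytics code; it assumes targets lie strictly inside the bin range):

```python
import torch
import torch.nn.functional as F

def distribution_focal_loss(pred_dist, target):
    """Sketch of Distribution Focal Loss (DFL).

    pred_dist: (N, n_bins) logits over discrete distance bins.
    target:    (N,) continuous distances in [0, n_bins - 1).
    """
    tl = target.long()            # left (lower) bin index
    tr = tl + 1                   # right (upper) bin index
    wl = tr.float() - target      # weight toward the left bin
    wr = 1.0 - wl                 # weight toward the right bin
    loss = (F.cross_entropy(pred_dist, tl, reduction="none") * wl
            + F.cross_entropy(pred_dist, tr, reduction="none") * wr)
    return loss.mean()
```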
3. CIOU Loss:¶
CIoU (Complete IoU) Loss is an enhancement over the traditional IoU (Intersection over Union) metric used in object detection. In addition to the overlap between the predicted and ground-truth boxes, CIoU also accounts for the distance between their center points and the difference in their aspect ratios. This leads to better bounding-box regression; in particular, it still provides a useful gradient when the predicted and ground-truth boxes do not overlap, a case where plain IoU is flat at zero.
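The three CIoU terms (overlap, center distance, aspect-ratio mismatch) can be computed directly from the standard formula. A sketch for a single pair of boxes (a simplification of the batched Ultralytics version):

```python
import math
import torch

def ciou(box1, box2, eps=1e-7):
    """Complete-IoU between two boxes in (x1, y1, x2, y2) form.

    CIoU = IoU - rho^2 / c^2 - alpha * v, where rho is the center
    distance, c the enclosing-box diagonal, and v penalizes
    aspect-ratio mismatch.
    """
    # intersection and union
    ix1, iy1 = torch.max(box1[0], box2[0]), torch.max(box1[1], box2[1])
    ix2, iy2 = torch.min(box1[2], box2[2]), torch.min(box1[3], box2[3])
    inter = (ix2 - ix1).clamp(0) * (iy2 - iy1).clamp(0)
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared center distance over squared enclosing-box diagonal
    cw = torch.max(box1[2], box2[2]) - torch.min(box1[0], box2[0])
    ch = torch.max(box1[3], box2[3]) - torch.min(box1[1], box2[1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = (((box1[0] + box1[2]) - (box2[0] + box2[2])) ** 2
            + ((box1[1] + box1[3]) - (box2[1] + box2[3])) ** 2) / 4
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)) ** 2
    alpha = v / (1 - iou + v + eps)
    return iou - rho2 / c2 - alpha * v
```

Note that for two disjoint boxes the result is negative (the distance term dominates), so the loss 1 - CIoU still pulls the prediction toward the target even at zero overlap.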
Batch Size Considerations - The selection of batch size plays a pivotal role when training YOLO (You Only Look Once) models. This is because it has implications for both the training pace and the ultimate precision of the model.
For starters, adopting larger batch sizes tends to expedite the training regimen. This is because processing several data samples simultaneously reduces the frequency of weight adjustments. However, there's a caveat: larger batches demand more computational memory. This means that, depending on your hardware, maximizing the batch size might not always be a practical choice.
Furthermore, batch size isn't just about speed; it also has a bearing on model precision. The YOLO loss function uniquely melds both localization and classification elements. Each element's gradient is calculated independently. When the batch size is minimized, these gradients might exhibit inconsistencies, potentially hampering convergence and, subsequently, model accuracy. Conversely, if the batch size is overly extensive, there's a risk of the model becoming too tailored to that specific batch, undermining its capacity to generalize and thus affecting validation accuracy.
In our experiments, we tried a range of batch sizes. A pressing challenge surfaced when batch sizes exceeded 4, as this exhausted our GPU memory. To work around this bottleneck, we reduced the image resolution and downsized our model; by switching to a smaller YOLOv8 variant, we could comfortably train with batch sizes greater than 4.
Choice of Optimizer - Initially, we leaned towards the Adam optimizer, a renowned and conventional choice, for several reasons:
- Swift Convergence: Adam typically achieves convergence more rapidly compared to many other optimization techniques like stochastic gradient descent (SGD). This is largely attributed to its adaptive learning rate mechanism.
- Dynamic Learning Rate: Adam distinguishes itself by adjusting the learning rate for every individual weight. This is done based on the assessed variance and mean of the gradients, facilitating quicker convergence.
- Resilience to Noisy Gradients: Adam displays an admirable resilience to noisy gradients, a challenge often faced in expansive deep learning architectures.
- Modest Memory Footprint: Another merit of Adam is its relatively low memory consumption, making it an ideal choice for extensive deep learning models.
However, upon evaluation, the outcomes left room for improvement. This led us to transition to the SGD optimizer, and the ensuing results were notably better. The SGD optimizer operates by amending model weights in alignment with the negative gradient of the loss function. This adjustment is based on a selected subset of training data, termed a mini-batch. Though often, the Adam optimizer is perceived to be superior to SGD, our experimentation painted a different picture, with SGD outperforming Adam.
In the final phase, we further refined our model by tuning the hyperparameters of the SGD optimizer to ensure optimal performance.
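For reproducibility, the optimizer switch maps directly onto Ultralytics' training arguments. A configuration sketch (the dataset path and hyperparameter values here are illustrative, not our exact final settings):

```python
from ultralytics import YOLO

model = YOLO("yolov8s.pt")          # pretrained checkpoint
model.train(
    data="data.yaml",               # dataset config (illustrative path)
    epochs=100,
    batch=4,                        # largest batch our GPU allowed
    optimizer="SGD",                # SGD outperformed Adam in our runs
    lr0=0.01,                       # initial learning rate (illustrative)
    momentum=0.937,
)
```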
Let's delve into the YOLOv8 model's outcomes:
Initially, our baseline model exhibited underwhelming performance. Even though there was evidence of the loss converging, it wasn't at an optimal speed. Investigating the subpar results led us to identify several issues, which we addressed in our subsequent model iterations.
Our examination revealed an overly high learning rate, even with the employment of the Adam optimizer. We discovered that the SGD optimizer was more effective, yielding improved outcomes. Moreover, the original batch size introduced excessive variability in model updates.
In our subsequent iteration, we pivoted to the SGD optimizer, adopted a larger batch size, and moderated the learning rate. Recognizing the intricacies of our dataset — with minute objects present in a range of contexts — we implemented augmentations. Augmentations, given the diversity they introduce, can push the model towards developing better feature maps. This fosters improved object detection as the model gets acquainted with diverse image variations, enhancing its generalization.
The outcome was evident in the improved loss and mAP metrics of the second model. However, progress plateaued after roughly the 20th epoch. To push past it, we kept the SGD optimizer (having already dropped the underperforming Adam) and added a cosine annealing learning rate scheduler, widely acknowledged for its efficacy. The resulting model showed a lower loss, a higher mAP, and sustained learning across epochs.
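The cosine annealing setup can be sketched in plain PyTorch (toy model and step counts are illustrative; `CosineAnnealingLR` is the scheduler family that Ultralytics' `cos_lr` training flag corresponds to):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=100)

lrs = []
for epoch in range(100):
    # ... one full training epoch would run here ...
    opt.step()          # placeholder step (no real gradients in this sketch)
    sched.step()
    lrs.append(sched.get_last_lr()[0])
```

The learning rate decays smoothly from 0.01 toward 0 along a half cosine, which avoids the abrupt drops of a step schedule and kept our later epochs productive.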
Certain iterations, like the 4th model, manifested signs of overfitting. A probable cause was the omission of augmentations, driving the model to overly tailor itself to training data and compromising its generalization capabilities. By reincorporating augmentations in the 5th model and judiciously selecting hyperparameters post extensive testing, we arrived at a considerably more proficient model.
In the visual representations, we also factored in the mAP across various categories. For instance, 'Plastic film' showcased a high mAP due to its abundant representation. Conversely, categories with sparse instances recorded diminished mAP values—a testament to the dataset's inherent imbalance we highlighted earlier.
Addressing GPU Memory Challenges¶
During our model training on the dataset, we faced a bottleneck with GPU memory, as it quickly depleted. We harnessed several cutting-edge techniques to address this:
Transfer Learning - Earlier, we highlighted the advantages of transfer learning for deep neural networks. One less-mentioned benefit is its potential to conserve GPU memory. A pre-trained model inherently has discerned numerous data patterns, enabling it to extract many pertinent features for a novel task. Using such a model as a feature extractor mitigates the necessity of initiating a complete training cycle, which can be resource-intensive and significantly tax GPU memory. Instead, we processed our input through this pre-trained model, garnered the features, and subsequently trained a more memory-conservative model using these features.
Our gravitation towards transfer learning was two-fold: the evident GPU memory constraints and its renowned efficacy for efficient model training. Using YOLOv8, we selectively froze gradient updates except in the final three layers of the decoder, the classification segment, and the bounding box regression component. This tactic was aimed at leveraging the model's pre-trained weights, which had already established an efficient feature space, ensuring resource conservation and promising outcomes.
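The freezing tactic itself is only a few lines of PyTorch: mark every parameter non-trainable except those in the parts to be fine-tuned. A toy sketch (a small `Sequential` stands in for YOLOv8; in the real model the unfrozen parts were the last decoder layers and the classification and box-regression heads):

```python
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(8, 8),   # "backbone" -- frozen
    torch.nn.ReLU(),
    torch.nn.Linear(8, 8),   # "neck" -- frozen
    torch.nn.ReLU(),
    torch.nn.Linear(8, 2),   # "head" -- trainable
)

for name, param in model.named_parameters():
    # only parameters of the last Linear (module index 4) keep gradients
    param.requires_grad = name.startswith("4.")

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
```

Frozen parameters need no gradient buffers in the backward pass, which is exactly where our GPU was running out of memory.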
A deep dive to pinpoint the memory consumption indicated that the GPU was reaching its limits during the backward pass. This suggested it was struggling to accommodate all the gradients. To counter this, we tested several techniques:
Reducing Batch Size - We first tried shrinking the batch size, but only a batch size of one allowed a full epoch to complete without saturating memory. Such a minuscule batch size brings its own set of challenges:
Erratic Convergence: Tiny batch sizes introduce gradient estimations with high variance, jeopardizing model convergence during training.
Extended Training Duration: A smaller batch size could hinder the convergence rate, prolonging the requisite training time. The increased batch frequency implies more passes through the network, elevating the time expense.
Poor Hardware Utilization: Very small batches under-utilize the GPU, and the overhead of the frequent gradient-based model updates ends up dominating the computation.
Compromised Generalization: Minimized batch sizes might induce overfitting, causing the model to be excessively tuned to training data and consequently underperform on novel data.
Thus, merely trimming the batch size wasn't an optimal resolution.
Gradient Accumulation - Implementing this technique allowed us to amass gradients over multiple smaller batches prior to a model parameter update. Rather than conducting an update post every data batch, the gradients are stored in the GPU memory, culminating after processing a designated number of batches. These collective gradients then dictate a singular, consolidated model parameter update. This approach essentially emulates the effects of a larger batch size, fostering swift convergence and enhanced outcomes. While the loss convergence depicted marked improvement with this method, it introduced a new predicament: every epoch's duration drastically increased, exceeding an hour.
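A minimal sketch of the accumulation loop (toy regression model and data; here `accum = 4` emulates a batch four times larger than what fits in memory):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()
accum = 4  # number of micro-batches per optimizer step

data = [(torch.randn(2, 4), torch.randn(2, 1)) for _ in range(8)]
steps = 0
opt.zero_grad()
for i, (x, y) in enumerate(data):
    loss = loss_fn(model(x), y) / accum  # scale so grads average correctly
    loss.backward()                      # grads accumulate in .grad
    if (i + 1) % accum == 0:
        opt.step()                       # one update per `accum` batches
        opt.zero_grad()
        steps += 1
```

Dividing the loss by `accum` before `backward()` is what makes the accumulated gradient equal the gradient of one large averaged batch; without it the effective learning rate would be `accum` times too high.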
Consequently, we found ourselves on the lookout for alternative solutions to expedite the model training process.